📌 Exploratory Data Analysis (EDA)

/tmp/ipykernel_13146/3946722459.py:4: DtypeWarning: Columns (19,30) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv("/home/ubuntu/.ssh/ad688-employability-sp25A1-group1-4/lightcast_jobs_postings.csv")
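The DtypeWarning above usually means pandas inferred different types for different chunks of the same column. A minimal sketch of one way to avoid it, assuming the affected columns can safely be read as text: the helper `load_postings` and its `mixed_type_columns` parameter are hypothetical names introduced for illustration, and the in-memory sample stands in for the real file.

```python
import io

import pandas as pd


def load_postings(source, mixed_type_columns=()):
    """Read a postings CSV in one pass, forcing the listed columns to text.

    `mixed_type_columns` is a hypothetical parameter: the warning reports
    the affected columns by position (19 and 30), not by name.
    """
    dtype = {name: str for name in mixed_type_columns}
    # low_memory=False makes pandas infer each column's type from the whole file
    return pd.read_csv(source, low_memory=False, dtype=dtype or None)


# Made-up rows standing in for the real CSV, for illustration only
sample_csv = io.StringIO("ID,CODE\n1,541511\n2,unknown\n")
sample = load_postings(sample_csv, mixed_type_columns=["CODE"])
print(sample.dtypes)
```

Reading the whole file at once uses more memory, so passing explicit dtypes for only the two affected columns may be the lighter option on a large export.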


#| eval: false
#| echo: false

df = df.drop_duplicates(subset=["TITLE", "COMPANY", "LOCATION", "POSTED"], keep="first")
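Before dropping rows, it can help to check how many postings the four-column key actually treats as duplicates. The following is a rough sketch on made-up rows; `count_duplicate_postings` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def count_duplicate_postings(frame, key_columns):
    """Count rows that drop_duplicates(subset=key_columns, keep="first") would remove."""
    return int(frame.duplicated(subset=key_columns, keep="first").sum())


# Made-up postings: the first two rows share every key column
toy = pd.DataFrame({
    "TITLE": ["Data Analyst", "Data Analyst", "BI Analyst"],
    "COMPANY": ["Acme", "Acme", "Acme"],
    "LOCATION": ["Boston, MA", "Boston, MA", "Boston, MA"],
    "POSTED": ["2024-05-01", "2024-05-01", "2024-05-02"],
})
key = ["TITLE", "COMPANY", "LOCATION", "POSTED"]

removed = count_duplicate_postings(toy, key)
deduped = toy.drop_duplicates(subset=key, keep="first")
print(f"removed {removed} of {len(toy)} rows, {len(deduped)} remain")
```

A large removed count would suggest the key is too coarse, for example when a company posts several distinct openings with the same title on the same day.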

#| eval: false
#| echo: false

# Keep only columns that are non-null in at least 50% of rows
df.dropna(thresh=len(df) * 0.5, axis=1, inplace=True)

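# Fill missing values. NAICS codes are categorical labels, so a median code is not
# meaningful; use a sentinel that pairs with the "Unknown" name instead.
# (-1 is a placeholder chosen here for illustration, not a Lightcast code.)
df["NAICS_2022_6"] = df["NAICS_2022_6"].fillna(-1)
df["NAICS_2022_6_NAME"] = df["NAICS_2022_6_NAME"].fillna("Unknown")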
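To see which columns the 50% threshold removes, the non-null fraction of each column can be inspected before calling `dropna`. This is a small sketch on a made-up frame; `columns_below_threshold` is a hypothetical helper.

```python
import math

import pandas as pd


def columns_below_threshold(frame, min_non_null_fraction=0.5):
    """List columns whose share of non-null values is below the threshold."""
    non_null_fraction = frame.notna().mean()
    return non_null_fraction[non_null_fraction < min_non_null_fraction].index.tolist()


# Made-up frame: A is 25% filled, B is 50% filled, C is complete
toy = pd.DataFrame({
    "A": [1, None, None, None],
    "B": [1, 2, None, None],
    "C": [1, 2, 3, 4],
})

to_drop = columns_below_threshold(toy)
# Equivalent dropna call: a column survives with at least ceil(50% of rows) non-null values
kept = toy.dropna(axis=1, thresh=math.ceil(len(toy) * 0.5)).columns.tolist()
print(to_drop, kept)
```

A column sitting exactly at 50% is kept, which matches the `thresh` semantics of requiring at least that many non-null values.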

Top 15 Job Posting Industries


We examined the 15 industries with the most job postings to see where demand is highest. The pie chart shows that 22.6% of the postings fall under the Unclassified Industry category, suggesting that many roles either span multiple sectors or lack a clear classification. This points to a limitation in how industries are labeled in the data: more than a fifth of the postings cannot be attributed to a specific sector.

We also noticed a strong demand for skills in technology and consulting. For example, Custom Computer Programming Services made up 12.1% of the postings, while Management Consulting Services accounted for 11.3%. This highlights a significant need for both tech and management skills in today’s job market.

Interestingly, several tech-focused industries, such as Software Publishers and Computer Systems Design Services, showed up prominently in the chart. This aligns with the growing demand for IT and consulting professionals, which didn’t surprise us given the ongoing digital transformation across industries.

We also found that the finance and healthcare sectors have a notable share of job postings, indicating steady demand in these fields. On the flip side, areas like Accounting and Temporary Help Services had fewer listings, suggesting these might be more niche markets for data roles.
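The industry shares discussed above could be computed along the following lines. This is a sketch on made-up rows, not the notebook's actual plotting code; `top_industry_shares` is a hypothetical helper, and it measures each share against all postings, whereas a pie chart of only the top 15 would show larger percentages.

```python
import pandas as pd


def top_industry_shares(frame, column="NAICS_2022_6_NAME", n=15):
    """Percentage of all postings held by each of the n most frequent industries."""
    shares = frame[column].value_counts(normalize=True).head(n) * 100
    return shares.round(1)


# Made-up postings, for illustration only
toy = pd.DataFrame({
    "NAICS_2022_6_NAME": (
        ["Unclassified Industry"] * 3
        + ["Software Publishers"] * 2
        + ["Unknown"]
    )
})

shares = top_industry_shares(toy, n=2)
print(shares)
```

Stating which denominator a chart uses (all postings or only the plotted categories) makes figures such as 22.6% easier to interpret.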


Remote vs. On-Site Jobs (Data Roles)


We created a pie chart to explore the distribution of remote, on-site, and hybrid roles for data-related jobs. The chart shows that 73.8% of postings do not specify a work arrangement, suggesting either data gaps or employer flexibility. Of all postings, 20.4% are remote, 4% are hybrid, and only 1.79% are fully on-site. Among the 26.2% of postings that do specify an arrangement, remote roles therefore make up roughly 78%, indicating strong demand for remote work.

These findings suggest that remote and flexible work arrangements are common in the data field, although the large unspecified share means the split should be read with caution. Highlighting remote work skills could be advantageous for job seekers in this area.
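Because most postings do not state a work arrangement, it is worth re-expressing the reported percentages relative to only the postings that do. The short calculation below uses the four figures quoted from the chart; the dictionary keys are labels chosen for illustration.

```python
# Shares of all postings, as reported from the pie chart (percent)
shares_of_all = {
    "unspecified": 73.8,
    "remote": 20.4,
    "hybrid": 4.0,
    "on_site": 1.79,
}

# Total share of postings that state any arrangement
specified_total = sum(
    value for label, value in shares_of_all.items() if label != "unspecified"
)

# Re-normalize so the specified categories sum to 100%
shares_of_specified = {
    label: round(100 * value / specified_total, 1)
    for label, value in shares_of_all.items()
    if label != "unspecified"
}

print(round(specified_total, 2), shares_of_specified)
```

Under this view remote roles account for a little under four fifths of the postings that name an arrangement, which is a stronger statement than the 20.4% headline figure but rests on only about a quarter of the data.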


📌 Key Findings

  1. Industry Demand:
  • Programming Services, Consulting Services, and Insurance emerged as leading industries for data-related roles, indicating a widespread need for data skills beyond traditional tech sectors.
  • The prominence of Unclassified Industry suggests potential gaps in data classification or a diverse range of roles that do not fit into conventional categories.
  2. Geographical Distribution:
  • California, Texas, and Florida were identified as major hubs for data-related jobs, while several Midwestern and Mountain states showed fewer opportunities.
  • This pattern suggests that professionals may find more job prospects by focusing on these high-demand states.
  3. Work Arrangement Preferences:
  • About 24% of job postings were explicitly remote or hybrid, with over 20% specifically offering remote options; most postings (73.8%) did not specify an arrangement.
  • The small proportion of fully on-site roles (1.79%) is consistent with a broader shift towards flexible work models in the data industry.
  4. Role-Specific Trends:
  • There is a clear upward trend in demand for roles such as Big Data Analysts and Business Intelligence Analysts, highlighting the growing importance of both technical and analytical skills.
  • Specialized roles like Clinical Data Analysts and Customer Data Analysts are also gaining traction, indicating expanding opportunities in niche areas.

📌 Conclusion

Our analysis revealed that the demand for data-related skills is both substantial and diverse, spanning multiple industries and regions across the United States. Key sectors such as tech, consulting, and insurance show the most significant opportunities, while states like California, Texas, and Florida lead in job postings.

The prevalence of remote and hybrid roles among postings that state a work arrangement highlights the importance of flexibility in the current job market. Meanwhile, the consistent rise in demand for specialized data roles suggests a promising outlook for professionals equipped with both analytical and domain-specific skills.

Overall, these findings suggest that focusing on high-demand industries, enhancing remote work capabilities, and acquiring specialized skills can significantly boost job prospects in the data field.